AITopics | policy vector


policy vector

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Transfer in reinforcement learning aims to solve a new target task either without additional learning or in a sample-efficient way, by exploiting agents and information obtained from source tasks. We review a line of research with relevant approaches. One group of approaches reuses policies learned on source tasks for target tasks. Fernández and Veloso [17] propose an exploration strategy for learning a new policy given a new task and a set of learned source policies: the gain of using each policy is estimated jointly online, and at each step one of the policies is selected probabilistically based on these gains. Their focus, however, is on aiding the training of the target policy with samples from the target task rather than on improving zero-shot transfer performance. Dayan [14], on the other hand, introduces successor representations (SRs), state-occupancy representations disentangled from rewards, which allow value functions to be decomposed linearly.
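The linear decomposition that SRs allow can be illustrated with a small tabular sketch. It assumes a fixed policy with state-transition matrix P, a discount factor gamma < 1, and a reward r that depends only on the state. The SR is then M = sum over t of gamma^t P^t = (I - gamma P)^(-1), and the value function is V = M r, so the value of a sum of rewards is the sum of their values. The function names are hypothetical and this is an illustration of the idea, not code from the cited works.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def successor_representation(P: Sequence[Sequence[float]], gamma: float, iters: int = 1000) -> Matrix:
    """M[s][s2] = expected discounted number of visits to s2 starting from s.

    Solves the fixed point M = I + gamma * P @ M by simple iteration,
    which converges for gamma < 1 (assumed here).
    """
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(iters):
        M = [
            [identity[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return M


def value_from_sr(M: Matrix, r: Sequence[float]) -> List[float]:
    """V = M r: the reward enters only through a linear combination of SR rows."""
    return [sum(M[s][s2] * r[s2] for s2 in range(len(r))) for s in range(len(M))]
```

For a three-state chain 0 -> 1 -> 2 with state 2 absorbing and gamma = 0.5, rewarding only state 2 gives V = [0.5, 1.0, 2.0]; because M does not depend on r, it can be reused for any new reward vector without further learning.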

large language model, machine learning, target task, (21 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)


This group of approaches reuses policies learned on source tasks for target tasks. There is a series of studies that directly exploits the smoothness of optimal values across tasks with function approximators. Figure 9: The performance profiles [2, 15] of inference with GPI and constrained GPI on Reacher. For its use in the zero-shot transfer problem, we first set four fixed goal locations at (0.1,0.0),(0.0,0.1),( Our first observation is that while the transferred agents perform comparably on some tasks, constrained GPI makes a significant difference on others, especially on the "Harsh" target tasks, whose task vectors have many elements equal to 1.
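Both methods compared in Figure 9 build on generalized policy improvement (GPI). A minimal sketch of plain GPI follows, assuming each source policy i comes with successor features psi_i(s, a) for the current state and the target task is described by a task vector w, so that Q_i(s, a) = psi_i(s, a) . w. GPI then acts greedily with respect to the best value over all source policies. The constrained variant is not reproduced here, and all names below are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def gpi_action(psi: Sequence[Sequence[Vector]], w: Vector) -> Tuple[int, List[float]]:
    """Plain GPI for one state.

    psi[i][a] is the successor-feature vector of source policy i for action a.
    Returns argmax_a max_i psi[i][a] . w together with the per-action maxima.
    """
    num_actions = len(psi[0])
    best_q = [max(dot(psi_i[a], w) for psi_i in psi) for a in range(num_actions)]
    action = max(range(num_actions), key=lambda a: best_q[a])
    return action, best_q
```

Because the task enters only through w, switching to a new target task (for example, one whose task vector has many elements equal to 1) requires only a new dot product, not new learning; this is what makes the approach suitable for zero-shot transfer.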

approximator, artificial intelligence, reacher, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.36)

Technology: Information Technology > Artificial Intelligence (1.00)


Learning to Plan via a Multi-Step Policy Regression Method

Wagner, Stefan, Janschek, Michael, Uelwer, Tobias, Harmeling, Stefan

arXiv.org Artificial Intelligence, Jun-18-2021

We propose a new approach to increase inference performance in environments that require a specific sequence of actions to be solved. This is, for example, the case in maze environments, where ideally an optimal path is determined. Instead of learning a policy for a single step, we want to learn a policy that can predict n actions in advance. Our proposed method, policy horizon regression (PHR), uses knowledge of the environment sampled by A2C to learn an n-dimensional policy vector in a policy distillation setup, which yields n sequential actions per observation. We test our method on the MiniGrid and Pong environments and show a drastic speedup at inference time by successfully predicting sequences of actions from a single observation.
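The core idea of predicting n actions from one observation can be sketched as a distillation target plus a loss. The sketch rests on several assumptions: the "n-dimensional policy vector" is modelled as n rows of action logits, the targets are the next n actions taken by the teacher (e.g. an A2C agent), and the loss is a sum of per-step cross-entropies. The paper's exact architecture and loss may differ, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple, TypeVar

Obs = TypeVar("Obs")


def make_horizon_targets(
    observations: Sequence[Obs], actions: Sequence[int], n: int
) -> List[Tuple[Obs, List[int]]]:
    """Pair observation o_t with the teacher's next n actions a_t, ..., a_{t+n-1}.

    Windows that would run past the end of the episode are dropped here;
    padding them would be another reasonable choice.
    """
    if n < 1:
        raise ValueError("horizon n must be at least 1")
    return [(observations[t], list(actions[t:t + n])) for t in range(len(actions) - n + 1)]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of action logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def horizon_distillation_loss(
    policy_vector: Sequence[Sequence[float]], target_actions: Sequence[int]
) -> float:
    """Sum over the n steps of the cross-entropy between the student's
    step-k action distribution and the teacher's step-k action."""
    if len(policy_vector) != len(target_actions):
        raise ValueError("need one row of logits per predicted step")
    return -sum(log_softmax(row)[a] for row, a in zip(policy_vector, target_actions))


def decode_action_sequence(policy_vector: Sequence[Sequence[float]]) -> List[int]:
    """Greedy decoding at inference time: one observation yields n actions."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in policy_vector]
```

With an episode of four observations, teacher actions [2, 0, 1, 1] and n = 3, `make_horizon_targets` yields two training pairs, ("o0", [2, 0, 1]) and ("o1", [0, 1, 1]). The inference speedup comes from `decode_action_sequence`: the network is queried once per n environment steps instead of once per step.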

agent, policy vector, teacher policy, (13 more...)

arXiv.org Artificial Intelligence

2106.10075

Country: Europe > Germany > North Rhine-Westphalia > Düsseldorf Region > Düsseldorf (0.04)

Genre:

Research Report (0.64)
Workflow (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.65)


Efficient Decoupled Neural Architecture Search by Structure and Operation Sampling

Lee, Heung-Chang, Kim, Do-Guk, Han, Bohyung

arXiv.org Machine Learning, Oct-23-2019

We propose a novel neural architecture search algorithm based on reinforcement learning that decouples the structure and operation search processes. Our approach samples candidate models from multinomial distributions over policy vectors defined independently on the two search spaces. The proposed technique significantly improves the efficiency of the architecture search process compared to conventional reinforcement-learning methods with RNN controllers, while achieving competitive accuracy and model size on target tasks. Our policy vectors remain easily interpretable throughout training, which makes it possible to analyze the search progress and the discovered architectures; by contrast, the black-box nature of RNN controllers makes it hard to understand training progress in terms of policy parameter updates. Our experiments demonstrate outstanding performance compared to state-of-the-art methods at a fraction of the search cost.
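Sampling from independent multinomial policy vectors can be sketched as follows. This is a minimal illustration under stated assumptions: each structure decision and each operation decision owns its own vector of logits, candidates are drawn by sampling every decision independently, and the vectors are updated with a plain REINFORCE step using a moving-average baseline. The paper's actual search spaces, reward (presumably validation accuracy) and update rule are not reproduced, and the class and function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[int], List[int]]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class DecoupledPolicy:
    """Independent policy vectors (logits) for structure and operation decisions."""

    def __init__(self, n_structure: int, structure_choices: int, n_ops: int, op_choices: int):
        # All logits start at zero, i.e. uniform distributions.
        self.structure = [[0.0] * structure_choices for _ in range(n_structure)]
        self.operation = [[0.0] * op_choices for _ in range(n_ops)]

    def sample(self, rng: random.Random) -> Sample:
        """Draw every decision independently from its own categorical distribution."""
        def draw(v: List[float]) -> int:
            return rng.choices(range(len(v)), weights=softmax(v))[0]
        return [draw(v) for v in self.structure], [draw(v) for v in self.operation]

    def reinforce_update(self, sample: Sample, advantage: float, lr: float) -> None:
        """REINFORCE step using d log softmax(v)[a] / d v[k] = 1[k == a] - p[k]."""
        for vectors, choices in ((self.structure, sample[0]), (self.operation, sample[1])):
            for v, a in zip(vectors, choices):
                p = softmax(v)
                for k in range(len(v)):
                    v[k] += lr * advantage * ((1.0 if k == a else 0.0) - p[k])


def toy_search(steps: int, seed: int) -> DecoupledPolicy:
    """Toy run: the (hypothetical) reward is 1 when operation 2 is chosen."""
    rng = random.Random(seed)
    policy = DecoupledPolicy(n_structure=1, structure_choices=3, n_ops=1, op_choices=4)
    baseline = 0.0
    for _ in range(steps):
        s, o = policy.sample(rng)
        reward = 1.0 if o[0] == 2 else 0.0  # stand-in for validation accuracy
        policy.reinforce_update((s, o), reward - baseline, lr=0.5)
        baseline = 0.9 * baseline + 0.1 * reward
    return policy
```

Because each decision is a plain probability vector, `softmax(policy.operation[i])` can be inspected at any point during the search, which mirrors the interpretability argument made in the abstract.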

architecture, architecture search, policy vector, (14 more...)

arXiv.org Machine Learning

1910.10397

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology: